Sankey Diagram in R

Introduction

A Sankey diagram is a type of flow diagram in which the width of each arrow is proportional to the flow quantity it represents. It is often used to visualize the flow of resources or information between entities. In this lab, we’ll use two R packages, plotly and networkD3, to generate a Sankey diagram.

Installing and loading the required packages

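# Install once; package names must be quoted strings.
# webshot2 is included because it is used later to export a static image.
install.packages(c("dplyr", "plotly", "networkD3", "webshot2"))
library(plotly)
library(networkD3)
library(dplyr)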

Importing data

df <- read.csv("Sankeydata.csv")
head(df)
    id gender    field personality
1 PT_1 Female Business Introverted
2 PT_2   Male      Law Extroverted
3 PT_3   Male  Science Introverted
4 PT_4 Female      Art Introverted
5 PT_5 Female Business Extroverted
6 PT_6   Male  Science Introverted

Transforming Data

# Count the rows in each personality/field combination
freq_table <- df %>%
  group_by(personality, field) %>%
  summarise(n = n())
freq_table
# A tibble: 8 × 3
# Groups:   personality [2]
  personality field        n
  <chr>       <chr>    <int>
1 Extroverted Art          6
2 Extroverted Business    17
3 Extroverted Law          9
4 Extroverted Science      7
5 Introverted Art         16
6 Introverted Business    15
7 Introverted Law         13
8 Introverted Science     17

Breaking down the terminology used

  • Node: Nodes are the source and target points in a Sankey plot, drawn as rectangles. Here, each personality type and each field is a node.
  • Link: Links connect nodes and depict the flow of entities from a source category to a target category. A link's thickness is proportional to its value.
  • Value: The number associated with a link, here the count of people with a given personality type in a given field.
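
The plotting code in the following sections reads `nodes$name`, `links$source`, `links$target`, and `links$value`, but the lab does not show how those two data frames are built. Here is a minimal sketch of one way to derive them from the frequency table, assuming personality types are the sources and fields are the targets. The counts are copied from the `freq_table` output above so that the block runs on its own. Both plotly and networkD3 expect zero-based node indices, hence the `- 1`.

```r
# Counts copied from the freq_table output shown earlier (base R, no packages needed)
freq_table <- data.frame(
  personality = rep(c("Extroverted", "Introverted"), each = 4),
  field       = rep(c("Art", "Business", "Law", "Science"), times = 2),
  n           = c(6, 17, 9, 7, 16, 15, 13, 17),
  stringsAsFactors = FALSE
)

# One node per distinct label: personality types first, then fields
nodes <- data.frame(
  name = c(unique(freq_table$personality), unique(freq_table$field)),
  stringsAsFactors = FALSE
)

# One link per row of the frequency table.
# match() returns 1-based positions, so subtract 1 to get zero-based indices.
links <- data.frame(
  source = match(freq_table$personality, nodes$name) - 1,
  target = match(freq_table$field, nodes$name) - 1,
  value  = freq_table$n
)

print(nodes)
print(links)
```

With this construction, `nodes` has 6 rows (2 personality types followed by 4 fields) and `links` has 8 rows, one per row of the frequency table.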

1. Sankey Plot using Plotly

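# `nodes` and `links` must already exist: nodes$name holds the node labels, and
# links holds zero-based source/target indices into nodes plus a value column.
link_color <- "rgba(0, 0, 0, 1)"  # opaque black links

p1 <- plot_ly(
  type = "sankey",
  orientation = "h",
  node = list(label = nodes$name),
  link = list(
    source = links$source,
    target = links$target,
    value = links$value,
    color = link_color
  )
) %>%
  layout(title = "Sankey Plot: Personality and Field")

# Save the interactive widget, then capture a static PNG of it
library(webshot2)
htmlwidgets::saveWidget(widget = p1, file = "p1.html")
webshot(url = "p1.html", file = "p1.png", delay = 5)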

NOTE: A static image of the output above is shown instead of the interactive plot because of memory issues.

2. Sankey Plot using networkD3

# Source, Target, Value and NodeID name columns in the links and nodes data frames
sankey_plot <- sankeyNetwork(Links = links, Nodes = nodes,
                             Source = "source", Target = "target", Value = "value",
                             NodeID = "name", fontSize = 12, nodeWidth = 30)
sankey_plot

Understanding the Sankey Plots

These Sankey plots show how extroverted and introverted personalities are distributed across the four fields: art, business, law, and science. A key takeaway is whether a field is favored by one personality type. For example, the link from Introverted to Art (16 people) is much wider than the link from Extroverted to Art (6 people).
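
The frequency table contains more introverted rows than extroverted ones (61 versus 39), so raw counts alone can exaggerate or hide a preference. The following sketch in base R converts the counts to within-personality shares, which is one way to check a comparison like the one above. The `share` column and the `art` object are illustrative names that are not in the original lab, and the counts are copied from the `freq_table` output shown earlier.

```r
# Counts copied from the freq_table output shown earlier
freq_table <- data.frame(
  personality = rep(c("Extroverted", "Introverted"), each = 4),
  field       = rep(c("Art", "Business", "Law", "Science"), times = 2),
  n           = c(6, 17, 9, 7, 16, 15, 13, 17),
  stringsAsFactors = FALSE
)

# Share of each field within its personality type:
# ave() returns the per-personality total, repeated for every row of that group
freq_table$share <- freq_table$n /
  ave(freq_table$n, freq_table$personality, FUN = sum)

# Compare the share choosing Art across the two personality types
art <- freq_table[freq_table$field == "Art", c("personality", "n", "share")]
print(art)
```

On these counts, 6 of 39 extroverted people (about 15%) and 16 of 61 introverted people (about 26%) chose art, so the preference still holds after accounting for group size.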

Conclusion

In conclusion, Sankey diagrams are a versatile way to visualize how quantities flow between categories. Because each flow is encoded as the width of a link, patterns such as which fields each personality type favors are easy to spot and to communicate. Both plotly and networkD3 can draw them from the same nodes and links data.